# Distillation for quantization on Textual Inversion models to personalize text-to-image generation

[Textual Inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text-to-image models like Stable Diffusion using your own images. _By using just 3-5 images, new concepts can be taught to Stable Diffusion and the model personalized on your own images._
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.
We have enabled distillation for quantization in `textual_inversion.py`, which performs quantization-aware training combined with distillation on the model generated by the Textual Inversion method.

## Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

```bash
pip install -r requirements.txt
```

## Prepare Datasets

One picture from the Hugging Face concept repository [sd-concepts-library/dicoo2](https://huggingface.co/sd-concepts-library/dicoo2) is needed. Save it to the `./dicoo` directory. The picture is shown below:

<a href="https://huggingface.co/sd-concepts-library/dicoo2/blob/main/concept_images/1.jpeg">
    <img src="https://huggingface.co/sd-concepts-library/dicoo2/resolve/main/concept_images/1.jpeg" width = "300" height="300">
</a>
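The download can be scripted; the following is a minimal sketch using only the Python standard library (the target filename `1.jpeg` matches the image linked above):

```python
import os
import urllib.request

# Create the dataset directory expected by the training command below.
os.makedirs("dicoo", exist_ok=True)

url = ("https://huggingface.co/sd-concepts-library/dicoo2"
       "/resolve/main/concept_images/1.jpeg")
try:
    urllib.request.urlretrieve(url, os.path.join("dicoo", "1.jpeg"))
except OSError as err:
    # No network access: fall back to saving the image manually.
    print(f"Download failed ({err}); save the image to ./dicoo by hand.")
```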

## Get an FP32 Textual Inversion model

Use the following command to fine-tune the Stable Diffusion model on the above dataset to obtain the FP32 Textual Inversion model.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="./dicoo"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<dicoo>" --initializer_token="toy" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="dicoo_model"
```
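Note that `--scale_lr` scales the base learning rate by the gradient-accumulation steps, the batch size, and the number of processes. Assuming a single process, the values above give:

```python
# Effective learning rate with --scale_lr (single-process assumption).
base_lr = 5.0e-4
grad_accum, batch_size, num_processes = 4, 1, 1
effective_lr = base_lr * grad_accum * batch_size * num_processes
print(effective_lr)  # 0.002
```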

## Do distillation for quantization

Once you have the FP32 Textual Inversion model, the following command takes it as input, performs distillation for quantization, and generates the INT8 Textual Inversion model.

```bash
export FP32_MODEL_NAME="./dicoo_model"
export DATA_DIR="./dicoo"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$FP32_MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --use_ema --learnable_property="object" \
  --placeholder_token="<dicoo>" --initializer_token="toy" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=300 \
  --learning_rate=5.0e-04 --max_grad_norm=3 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="int8_model" \
  --do_quantization --do_distillation --verify_loading
```
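Conceptually, distillation for quantization trains a quantized student against the FP32 teacher. The following is a hedged sketch of such a combined loss (the function name `distill_quant_loss` and the weighting `alpha` are illustrative assumptions, not the script's exact implementation):

```python
import torch
import torch.nn.functional as F

def distill_quant_loss(student_pred, teacher_pred, task_loss, alpha=0.5):
    """Blend the task loss with an MSE distillation term that pulls the
    quantized (student) UNet's noise prediction toward the FP32 teacher's."""
    kd = F.mse_loss(student_pred, teacher_pred)
    return alpha * kd + (1.0 - alpha) * task_loss

# Dummy tensors standing in for UNet noise predictions.
student = torch.randn(1, 4, 64, 64)
teacher = torch.randn(1, 4, 64, 64)
loss = distill_quant_loss(student, teacher, task_loss=torch.tensor(0.1))
print(float(loss))
```

The `--verify_loading` flag additionally reloads the saved INT8 model after training to check that it can be restored correctly.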

## Inference

Once you have trained an INT8 model with the above command, inference can be done simply using the `text2images.py` script. Make sure to include the `placeholder_token` in your prompt.

```bash
export INT8_MODEL_NAME="./int8_model"

python text2images.py \
  --pretrained_model_name_or_path=$INT8_MODEL_NAME \
  --caption "a lovely <dicoo> in a red dress and hat, in the snowy and brightly lit night, with many bright buildings." \
  --images_num 4
```

Here is the comparison of images generated by the FP32 model (left) and INT8 model (right) respectively:

<p float="left">
  <img src="https://huggingface.co/datasets/Intel/textual_inversion_dicoo_dfq/resolve/main/FP32.png" width = "300" height = "300" alt="FP32" align=center />
  <img src="https://huggingface.co/datasets/Intel/textual_inversion_dicoo_dfq/resolve/main/INT8.png" width = "300" height = "300" alt="INT8" align=center />
</p>

